For a project I was working on I needed to determine the distance between multiple latitude longitude coordinates. After doing some googling it was odvious that Haversine Distance is what I should use for this.
What is Haversine Distance
According to Wikipedia "Haversine Formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes". The formula for calculating Haversine distance is:
- a = sin²(Δφ/2) + cos φ1 ⋅ cos φ2 ⋅ sin²(Δλ/2)
- c = 2 ⋅ atan2( √a, √(1−a) )
- d = R ⋅ c
In formula above φ is the latitude and λ is longitude w/ R representing the Radius of the earth(a sphere, hush flat earthers!).
Then d is representing the haversine distance.
To gain a better understanding of the formula I wanted to implement it in Python. I'm no mathematician but I found this article by Simon Kettle easy to follow for understanding how to implement this in Python.
"""
Utility class to use python to calculate the haversine distance between two longitude
and latitude points.
"""
from math import radians, sin, cos, atan2, sqrt
# Radius of earth in meters
RADIUS_OF_EARTH = 6371000
def calculate_haversine_distince(lat_long_start:list, lat_long_end:list):
"""Calculate the haversine distance."""
# Taken from here
# https://community.esri.com/t5/coordinate-reference-systems-blog/distance-on-a-sphere-the-haversine-formula/ba-p/902128
# Direct Unpack the
lat1, lon1 = lat_long_start
lat2, lon2 = lat_long_end
# Radians
phi_1 = radians(lat1)
phi_2 = radians(lat2)
delta_phi = radians(lat2 - lat1)
delta_lambda = radians(lon2 - lon1)
a = sin(delta_phi / 2.0) ** 2 + cos(phi_1) * cos(phi_2) * sin(delta_lambda / 2.0) ** 2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
# Output distance in meters
meters = RADIUS_OF_EARTH * c
return meters
You can see the source code for this here.
Implementing the formula led to greater understanding and some new insights for me:
- Radians above we're converting degrees(latitude and longitude) to radians because this is the standard unit of angular measure used in the Haversine formula.
- 6,371,000 is the mean earth radius in meters and is used to get to meters in the formula above.
Testing
Implementing the raw formula in Python is great but I also need to ensure that the output from the formula above is correct. In order to do that I'm going to use Scikit-learn implementation of Haversine Distance which can be seen here .
"""
Test for haversine distance.
"""
import logging as log
from math import radians
import pytest
from sklearn.metrics.pairwise import haversine_distances
from python_examples.math.haversine_distance import calculate_haversine_distince, RADIUS_OF_EARTH
@pytest.mark.parametrize("lat_long_origin,lat_long_destination", [
# Scikit Learn example Ezeiza Airport -> Charles de Gaulle Airport
pytest.param(
[-34.83333, -58.5166646],
[49.0083899664, 2.53844117956]
)
])
def test_calculate_haversince_distance(lat_long_origin:list, lat_long_destination:list):
"""Test haversine_distance against """
log.info(f"{lat_long_origin} - {lat_long_destination}")
origin_in_radians = [radians(_) for _ in lat_long_origin]
destination_in_radians = [radians(_) for _ in lat_long_destination]
distance_matrix_result = haversine_distances([origin_in_radians, destination_in_radians])
km_distance_matrix_result = distance_matrix_result * RADIUS_OF_EARTH / 1000
haversine_distance = calculate_haversine_distince(lat_long_origin, lat_long_destination)
km_haversine_distance = round(haversine_distance / 1000, 8)
sklearn_distance = km_distance_matrix_result[0][1]
log.info(f"Raw Calc={km_haversine_distance}, Sklearn Calc={sklearn_distance}")
# Rounding both to 8 digits of accuracy...
assert km_haversine_distance == round(km_distance_matrix_result[0][1], 8)
You can see the source for this here.
Insights from doing the testing:
- Outside of greater understanding and not wanting to bring scikit-learn into your dependencies due to it's size there's no real need to implement Haversine Distance in python on your own. Scikit learn returns a distance matrix and can natively handle N latitude and longitude points finding the distance between all of them.
- Distance Matrix is a useful data structure for seeing the distance between multiple points.
- Getting the shortest distance from the 2D array returned by scikit-learn can be achieved by doing the following:
def test_shortest_distance():
# Lincoln Memorial Latlong
lincoln_memorial = [38.889248, -77.050636]
# Ducinni's off U St
ducinnis_pizza = [38.91706701509112, -77.04118715785616]
# Admo Jumbo Slice
jumbo_slice_pizza = [38.92105748065034, -77.04170211776945]
# Manny Olgas Near U
manny_olgas = [38.91543818061708, -77.03174726038766]
# Based off Haversine which is closest to Lincoln Memorial
dataset = [
{'name': 'Lincoln Memorial', 'geocode': lincoln_memorial},
{'name': "Ducinni's Pizza", 'geocode': ducinnis_pizza},
{'name': 'Jumbo Slice Pizza', 'geocode': jumbo_slice_pizza},
{'name': "Manny Olga's", 'geocode': manny_olgas}
]
# Put all these in radians
radian_distances = list(map(lambda x: [radians(_) for _ in x['geocode']], dataset))
# Calculate the Haversine Distances
distance_matrix_result = haversine_distances(radian_distances)
# Pull out the first Entry in the 2D array
km_distance_matrix_result = distance_matrix_result * RADIUS_OF_EARTH / 1000
log.info(f"\nDistance Matrix in Kilometers:\n {km_distance_matrix_result}")
# Get the index of the minimum distance for the lincoln memorial
np_array = np.array(km_distance_matrix_result[0][1:])
idx_min = np.argmin(np_array)
# Closest Should be Ducinnis
assert idx_min + 1 == 1
log.info(f"\nClosest Jumbo Slice spot to Lincoln Memorial by Haversine Distance is {dataset[idx_min + 1]['name']}")
Code can be seen here.
With example output below:

From all this I've learned Haversine Distance is a useful formula for calculating distances between geo-coordinates. In general I'd just use scikit-learn in order to get these distances but it's always great to have some base understanding of the formula you're using.